Hercules Against Data Series Similarity Search
We propose Hercules, a parallel tree-based technique for exact similarity
search on massive disk-based data series collections. We present novel index
construction and query answering algorithms that leverage different
summarization techniques, carefully schedule costly operations, optimize memory
and disk accesses, and exploit the multi-threading and SIMD capabilities of
modern hardware to perform CPU-intensive calculations. We demonstrate the
superiority and robustness of Hercules with an extensive experimental
evaluation against state-of-the-art techniques, using many synthetic and real
datasets, and query workloads of varying difficulty. The results show that
Hercules performs up to one order of magnitude faster than the best competitor
(which is not always the same). Moreover, Hercules is the only index that
outperforms the optimized scan in all scenarios, including the hard query
workloads on disk-based datasets. This paper was published in the Proceedings
of the VLDB Endowment, Volume 15, Number 10, June 2022.
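To make the idea of summarization-based exact search concrete, here is a minimal sketch, not the Hercules algorithm itself. The abstract does not name the summarizations it uses, so Piecewise Aggregate Approximation (PAA) and its standard lower bound on Euclidean distance are swapped in as an assumption. The function names (`paa`, `paa_lower_bound`, `exact_nn`) are hypothetical, and the sketch is single-threaded and in-memory.

```python
import math

def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment.
    Assumes len(series) is divisible by segments."""
    size = len(series) // segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(segments)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def paa_lower_bound(paa_q, paa_c, length):
    """Lower bound on the Euclidean distance between the two original series."""
    size = length // len(paa_q)
    return math.sqrt(size * sum((x - y) ** 2 for x, y in zip(paa_q, paa_c)))

def exact_nn(query, collection, segments=2):
    """Exact 1-NN search that skips candidates whose lower bound cannot beat
    the best distance found so far. Returns (index, distance, full_computations)."""
    paa_q = paa(query, segments)
    summaries = [paa(s, segments) for s in collection]  # would be built at index time
    best_idx, best_dist, full_computations = -1, math.inf, 0
    for i, (series, summary) in enumerate(zip(collection, summaries)):
        if paa_lower_bound(paa_q, summary, len(query)) >= best_dist:
            continue  # pruned: the true distance is at least the lower bound
        full_computations += 1
        dist = euclidean(query, series)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist, full_computations
```

Because the lower bound never exceeds the true distance, pruning cannot discard the true nearest neighbor, so the answer stays exact while fewer full distance computations are performed.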
ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees
Existing systems dealing with the increasing volume of data series cannot
guarantee interactive response times, even for fundamental tasks such as
similarity search. Therefore, it is necessary to develop analytic approaches
that support exploration and decision making by providing progressive results,
before the final and exact ones have been computed. Prior works lack both
efficiency and accuracy when applied to large-scale data series collections. We
present and experimentally evaluate ProS, a new probabilistic learning-based
method that provides quality guarantees for progressive Nearest Neighbor (NN)
query answering. We develop our method for k-NN queries and demonstrate how it
can be applied with the two most popular distance measures, namely, Euclidean
and Dynamic Time Warping (DTW). We provide both initial and progressive
estimates of the final answer that improve as the similarity search
proceeds, as well as suitable stopping criteria for the progressive queries.
Moreover, we describe how this method can be used in order to develop a
progressive algorithm for data series classification (based on a k-NN
classifier), and we additionally propose a method designed specifically for the
classification task. Experiments with several diverse synthetic and real
datasets demonstrate that our prediction methods constitute the first practical
solutions to the problem, significantly outperforming competing approaches.
This paper was published in the VLDB Journal (2022).
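The notion of progressive query answering with the two distance measures named above can be sketched as follows. This is only an illustration under stated assumptions: it reports each improved best-so-far answer during a sequential scan, and it does not implement the probabilistic quality guarantees or stopping criteria that ProS contributes. The names `progressive_nn`, `euclidean`, and `dtw` are hypothetical, and `dtw` is the textbook unconstrained dynamic-programming formulation.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Unconstrained Dynamic Time Warping distance (squared point costs,
    square root of the accumulated cost), computed row by row."""
    prev = [0.0] + [math.inf] * len(b)
    for x in a:
        cur = [math.inf]
        for j, y in enumerate(b, 1):
            cost = (x - y) ** 2
            cur.append(cost + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return math.sqrt(prev[-1])

def progressive_nn(query, collection, distance):
    """Yield (series_visited, best_index, best_distance) each time the
    best-so-far 1-NN answer improves; the last item yielded is the exact answer."""
    best_idx, best_dist = -1, math.inf
    for visited, series in enumerate(collection, 1):
        dist = distance(query, series)
        if dist < best_dist:
            best_idx, best_dist = visited - 1, dist
            yield visited, best_idx, best_dist
```

A caller can consume the generator and stop as soon as an intermediate answer is judged good enough; deciding when that is the case is exactly the part this sketch leaves out.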
Unleashing early maturity academic innovations
The Arab region has many teaching-intensive universities that are intrinsically committed to holistic educational excellence. According to a recent UNESCO report, the higher education sector in the Arab region needs massive expansion, given exponentially growing populations, record-breaking youth cohorts, and a strong recognition of the economic and social value of higher education. Such an enormous need for growth poses a significant challenge for publicly funded universities, yet offers many opportunities for private universities to meet the ever-increasing demand for advanced education. On another front, computing education is trending in the region, with a reputation for high market demand, a certain future, and high pay.
Fault-Tolerant Termination Detection with Safra’s Algorithm
Safra’s distributed termination detection algorithm employs a logical token ring structure within a distributed network; only passive nodes forward the token, and a counter in the token keeps track of the number of messages sent minus the number of messages received. We adapt this classic algorithm to make it fault-tolerant. The counter is split into counters per node, so that counts from crashed nodes can be discarded. If a node crashes, the token ring is restored locally and a backup token is sent. Nodes inform each other of detected crashes via the token. Our algorithm imposes no additional message overhead, tolerates any number of crashes as well as simultaneous crashes, and copes with crashes in a decentralized fashion. Experiments with an implementation of our algorithm were performed on top of two fault-tolerant distributed algorithms.
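The classic, non-fault-tolerant token mechanism described in the first sentence can be sketched as a small synchronous simulation. This is a simplification under stated assumptions, not the paper's algorithm: the per-node counters, ring repair, and backup token are not shown, and a round here only runs when every node is passive, whereas the real token waits at an active node. The names `Node`, `send`, `deliver`, and `token_round` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    active: bool = False  # only passive nodes forward the token
    black: bool = False   # set when the node receives a message
    counter: int = 0      # messages sent minus messages received

def send(nodes, src, dst, in_flight):
    """Node src sends a basic message to node dst."""
    nodes[src].counter += 1
    in_flight.append(dst)

def deliver(nodes, in_flight):
    """Deliver the oldest in-flight message; the receiver becomes active and black."""
    node = nodes[in_flight.pop(0)]
    node.counter -= 1
    node.active = True
    node.black = True

def token_round(nodes):
    """Pass a fresh white token around the ring N-1, ..., 1 and back to the
    initiator (node 0). Returns True if termination is detected."""
    if any(node.active for node in nodes):
        return False  # simplification: the real token would wait instead
    initiator = nodes[0]
    initiator.black = False  # the initiator whitens itself when starting a round
    total, token_black = 0, False
    for node in reversed(nodes[1:]):
        total += node.counter
        token_black = token_black or node.black
        node.black = False  # a node whitens itself after forwarding the token
    # The initiator is always white here because nothing is delivered mid-round.
    return not token_black and not initiator.black and total + initiator.counter == 0
```

In this sketch a round fails while a message is still in flight (the counts do not sum to zero) and fails once more after a delivery (the token turns black), so termination is only announced after a clean round with a white token and a zero sum.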
Data Series Progressive Similarity Search with Probabilistic Quality Guarantees
Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We provide both initial and progressive estimates of the final answer that improve as the similarity search proceeds, as well as suitable stopping criteria for the progressive queries. Experiments with synthetic and diverse real datasets demonstrate that our prediction methods constitute the first practical solution to the problem, significantly outperforming competing approaches.